Production Data Export and Archiving System for the New Data Format of the BaBar Experiment
Abstract
BaBar has recently moved away from using Objectivity/DB for its event store toward a ROOT-based one. Data in the new format is produced at over two dozen institutions worldwide, including SLAC. Among the new challenges are the organization of data export from remote institutions, its archiving at SLAC, and its delivery to users for analysis or for import to their own institutions. The new system is designed to be scalable, easily configurable on both the client and the server side, and adaptive to server load. It is integrated with BaBar's mass storage system (HPSS) and with the xrootd service [1]. The design, implementation, and operational experience with the new system, as well as future developments, are discussed in this article.

BABAR DATA PRODUCTION

The BaBar experiment, based at the Stanford Linear Accelerator Center, has been taking data since May 1999. The initial design of the event store used Objectivity/DB [2] as the database technology [3]. Later, a ROOT-based event store, called Kanga, was developed and used in parallel for analysis only. At the beginning of Run 4 (November 2003), a new ROOT-based event store was developed and adopted as the primary event store of the experiment [4]. Almost all of the data stored in Objectivity/DB has been converted to the new format. The volume of data accumulated in the Objectivity/DB federations to date is 931 TB. Over 161 TB have been produced in the new Kanga format, including converted Objectivity data, the initial reconstruction of Run 4 data, and skimming of Run 1-4 data. Over 260 TB of disk space is used for production and analysis at SLAC.

Distributed Production

Since the beginning of the experiment, data has been produced not only at SLAC but also at other institutions. Initially, Monte Carlo production was carried out at SLAC and LLNL; later, IN2P3 (France) and several other institutions started to contribute to Simulation Production. Today, with over 1 PB of data, production runs at over two dozen institutions in the US, Canada, and Europe. Nowadays, off-site data production is not limited to simulation: INFN (Padova) performs all of the initial event reconstruction, while IN2P3, INFN, and GridKa (Karlsruhe, Germany) contribute to skimming.

Past Experience with Data Management

The distributed production effort in BaBar brought several challenges in data management. Initially, data from IN2P3 was shipped to SLAC on tapes via commercial postal services. Shipment delays and the manual handling of such imported data at SLAC, together with improvements in network connections, led to a switch to network-only import/export. To improve transfer speed and network utilization, bbcp [5], a high-performance multi-stream copy utility, was developed at SLAC.

Another complication of the import/export procedure came from the specifics of the Objectivity/DB OODBMS and the event store design based on it. Database files had to be registered in the Objectivity federation catalog. During this process, referred to as attaching a database to a federation, the database file was scanned by the Objectivity/DB administration utility, and any data corruption halted the process, requiring the intervention of a database administrator. In addition, new collections had to be loaded into the collection tree (the application-level catalog). Both operations required locking the federation's critical metadata, which interfered with users' analysis activities. Sometimes a stubborn lock held by another application prevented the import from completing until the lock was removed.
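As an aside on the bbcp utility mentioned above: it is a command-line tool, and a transfer script can drive it non-interactively. The fragment below is only a minimal sketch of such an invocation; the option names (-s for the number of parallel streams, -w for the TCP window size) follow bbcp's commonly documented usage and should be checked against the installed version, and the host and path names are purely illustrative.

    # Illustrative only: drive bbcp from a small Python wrapper to copy one file
    # over several parallel TCP streams. Option names (-s: number of streams,
    # -w: TCP window size) follow bbcp's commonly documented usage; host and
    # destination path are hypothetical.
    import subprocess

    def push_with_bbcp(local_file, destination, streams=8, window="2M"):
        cmd = ["bbcp", "-s", str(streams), "-w", window, local_file, destination]
        subprocess.run(cmd, check=True)   # raise if bbcp reports a failure

    push_with_bbcp("skim-run4-0001.root",
                   "datamover.slac.stanford.edu:/import/incoming/")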
The scalability problems of the Objectivity federations, which manifested themselves in growing metadata access times as a federation got larger, also affected the delivery of data to physicists. To ease the burden of lock conflicts in the analysis federations, import of off-site data was done in dedicated import federations, where the databases were initially attached. Subsequently, the databases were attached to the analysis federations without a new scan, which reduced the locking time, and the collections were loaded directly into the analysis federations. Nevertheless, this procedure was not used for SLAC production, where delivery time was more critical and it was easier to solve all problems within the same site.

As mentioned above, data corruption was one of the major issues for data import, since the Objectivity/DB tools were limited to checking for corruption at the database level. Collections could only be checked with standard physics analysis applications, which was unacceptable because of the time and resources such operations required.

After import into an Objectivity/DB federation, the data had to be archived in the SLAC mass storage system (HPSS by IBM). This system, while offering excellent reliability and scalability to over 1 PB of data, has its own disadvantages. HPSS uses proprietary code, so the tape mount order and the file read and write order could not be controlled. Additional disk space had to be acquired to reduce the load on HPSS and the staging time.

To summarize, the import/export problems included:
- absence of efficient higher-level tools to check data consistency;
- very little automation of import error handling;
- different import procedures for data produced at SLAC and at remote sites;
- inability to control HPSS to the desirable degree.

Motivation for New Development

Some of the problems mentioned above have been addressed in the design of the new ROOT-based event store. Since there are no longer any federations, data files only have to be placed in the mass storage system and in the analysis area where users can access them. There are no corruption issues, because data consistency is checked with a special tool before the data leaves the production site. To take advantage of the new event store design and resolve the remaining issues, new transfer/import tools were needed. Among the requirements for the new transfer tool were:
- full automation of data transfer and archiving;
- unification of all export and archiving procedures;
- reduction of human involvement in error handling;
- an HPSS-friendly system;
- low resource utilization, with a focus on disk bandwidth;
- protocol-level backward compatibility with the Objectivity transfer/import tools;
- a streamlined procedure for further data processing;
- assistance with exporting data to remote institutions.

DEVELOPMENT OF A NEW TOOL

During the development of the new tool, the focus was not on designing yet another full-blown Storage Resource Manager (SRM) or metadata catalog, but on making it as simple as possible while implementing the basic ideas discussed below.

Sharing Responsibilities and Concerns between Subsystems

Figure 1: Sharing concerns and responsibilities between subsystems.

The first idea is illustrated in Fig. 1. The goal is to modularize transfer/import and to define the responsibilities of each subsystem. In our scheme, the Production subsystem is responsible for checking data quality and consistency; if data corruption is detected at a later stage, this subsystem has to deal with the issue. The Transfer subsystem is responsible for delivering data from the production site to the import servers.
It verifies checksums after the transfer and keeps the necessary metadata about the transferred files. The Archiving subsystem is responsible for saving files in the Mass Storage System; it is usually very site-specific and has its own protocols and policies, which makes it difficult to use out-of-the-box tools. The role of the Import subsystem is very simple in this scheme: it processes files after the transfer, preparing them for archiving, and hands them over to the Archiving subsystem. Preparation may include changing file ownership, placing files in migratable space, and complying with other MSS protocols. It is worth emphasizing that transfer is detached from import, which makes the whole procedure less susceptible to any failure that might occur during WAN transfers or even at the remote site.

Figure 2: Central management.

Push-Pull Model of Data Transfer

Since SLAC is the only site that archives all the produced data in the Mass Storage System and provides all the data to end users, it is natural to have a SLAC-centric data distribution model. Data management at SLAC is provided by the Computing Department; all services are expected to be always available, and problems are expected to be solved in a timely manner. We have therefore chosen a transfer model in which remote producers "push" data into SLAC. In this model, local problems at the many remote institutions are of no concern to the SLAC administration and are dealt with by the local administrators, whereas a problem that occurs at SLAC would most likely affect all data exporters and would be fixed by an administrator on duty. On the other hand, data that needs to be exported to a remote site for analysis by local physicists is "pulled" from SLAC. This is done for the same reasons: sites initiate the transfer when they are ready, and may choose to implement a protocol at their end that suits them.
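As a concrete illustration of this push-pull split, the fragment below sketches how per-site configuration might look; the layout, keys, and host and path names are assumptions made for illustration, not the tool's actual configuration syntax.

    # Hypothetical per-site configuration illustrating the push-pull model:
    # production sites push into SLAC, analysis sites pull from SLAC.
    SITE_CONFIGS = {
        # A remote production site pushes newly produced files to SLAC.
        "padova-reco": {
            "mode": "push",
            "local_spool": "/data/export/outgoing",
            "remote_endpoint": "datamover.slac.stanford.edu:/import/incoming",
        },
        # A remote analysis site pulls selected datasets from SLAC when ready.
        "gridka-analysis": {
            "mode": "pull",
            "remote_endpoint": "datamover.slac.stanford.edu:/export/kanga",
            "local_destination": "/data/analysis/kanga",
        },
    }

Keeping the direction of initiation a per-site choice confines local failures to the site that owns them, while a failure on the SLAC side remains visible to, and fixable by, a single administration.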
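Returning to the Import subsystem described earlier in this section, the sketch below shows what its per-file processing might look like under the stated assumptions; it is not BaBar's production code. The checksum is assumed to arrive with the transfer metadata, and the staging path, queue object, and function names are hypothetical.

    # A minimal sketch of the Import subsystem's per-file step: verify the
    # checksum recorded by the Transfer subsystem, place the file in migratable
    # space, and hand it to the Archiving subsystem. MIGRATABLE_AREA and the
    # archive_queue object are hypothetical.
    import hashlib
    import shutil
    from pathlib import Path

    MIGRATABLE_AREA = Path("/hpss-staging/incoming")

    def md5sum(path, chunk_size=1 << 20):
        digest = hashlib.md5()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk_size), b""):
                digest.update(block)
        return digest.hexdigest()

    def import_file(path, expected_md5, archive_queue):
        path = Path(path)
        if md5sum(path) != expected_md5:
            # Transfer is decoupled from import, so only this file's transfer
            # needs to be repeated; nothing has reached the archive yet.
            raise IOError(f"checksum mismatch for {path}")
        staged = MIGRATABLE_AREA / path.name          # place in migratable space
        shutil.move(str(path), staged)
        archive_queue.put(staged)                     # hand over to Archiving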